Locality-aware chunk distribution #3102
base: main
Conversation
Please take a moment and address the merge conflicts of your pull request. Thanks!
Force-pushed from 130ef17 to 0f64189
I can see a scenario where we end up with one ApplyTask instance effectively utilising only one thread for a 512x512 region of the edit, causing the whole edit to wait for it after everything else is finished. I don't know if there's a particular way around that, but in a complex operation this could be quite a noticeable performance hit.
Review comments (resolved):
worldedit-core/src/main/java/com/fastasyncworldedit/core/queue/implementation/ApplyTask.java (7 threads, 2 outdated)
worldedit-core/src/main/java/com/fastasyncworldedit/core/queue/implementation/QueueHandler.java (2 threads)
Yes, that's possible in theory, and the heuristic for when to split is rather simple currently. In practice, it would mean the thread pool is already busy and there are even more tasks waiting. We could e.g. force splitting on 512x512 regions. I also thought about re-checking the heuristics during processing, but I think that complicates things too much.
If we are processing a larger region one smaller region at a time in the first place, rather than with the for x; for z loops, I think it would be much more reasonable to check at the completion of each subregion whether we want to parallelise further, and submit the rest accordingly. E.g. create a queue at the start of a region's processing and just submit the remaining subregions as/when needed.
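That submit-as-needed idea could look roughly like the following sketch (all names here are hypothetical, not the actual ApplyTask API): a worker drains a shared queue of subregions and, after finishing each one, re-checks whether the pool has spare capacity and hands off the remainder if so.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of submit-as-needed subregion processing.
public class SubRegionQueue {
    record SubRegion(int chunkX, int chunkZ) {}

    static final AtomicInteger processed = new AtomicInteger();

    static void process(ConcurrentLinkedQueue<SubRegion> queue, ForkJoinPool pool) {
        SubRegion next;
        while ((next = queue.poll()) != null) {
            applyEdit(next);
            // Re-check the heuristic after every subregion: if the pool has
            // idle threads and work remains, fork a helper for the rest.
            if (!queue.isEmpty() && pool.getActiveThreadCount() < pool.getParallelism()) {
                pool.execute(() -> process(queue, pool));
            }
        }
    }

    // Placeholder for the real chunk-editing work.
    static void applyEdit(SubRegion r) {
        processed.incrementAndGet();
    }
}
```

Because each subregion is polled from the queue exactly once, forking extra helpers never duplicates work; it only spreads the remaining queue over more threads.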
Hm, I guess processing chunks along a Hilbert curve might be useful there. Otherwise, always splitting and then unforking is basically that, and the overhead of eagerly creating that many tasks is most likely not a problem.
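For reference, a compact Hilbert-index function (the standard bit-twiddling formulation, not code from this PR) that could order chunks so that consecutive indices are always neighboring cells:

```java
// Maps (x, y) on an n*n grid (n a power of two) to its position along a
// Hilbert curve; consecutive positions are always adjacent cells.
public class HilbertOrder {
    static long xy2d(long n, long x, long y) {
        long d = 0;
        for (long s = n / 2; s > 0; s /= 2) {
            long rx = (x & s) != 0 ? 1 : 0;
            long ry = (y & s) != 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            if (ry == 0) { // rotate/reflect the quadrant for the next level
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                long t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }
}
```

Sorting chunks by `xy2d` and handing each thread a contiguous index range would keep each thread's chunks spatially clustered.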
I reworked the heuristic for directly processing a region a bit:
With this design, I think it is very unlikely that we process overly large regions that take more time than the pool can currently absorb. We can explore more dynamic approaches in the future, but I'd like to start with a deliberately simple solution.
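As a rough illustration of that kind of sizing heuristic (the names and the over-partitioning factor are made up, not the PR's actual values): pick a target piece size from the region size and the pool's parallelism, so there are always several pieces per thread and a straggler can only stall the edit briefly.

```java
public class SplitHeuristic {
    // Aim for a few tasks per pool thread so no single piece dominates,
    // but never split below one chunk per task.
    static int targetChunksPerTask(int totalChunks, int parallelism) {
        int piecesWanted = parallelism * 4; // illustrative over-partitioning factor
        return Math.max(1, totalChunks / piecesWanted);
    }
}
```

With 8 pool threads, a 1024-chunk edit would be cut into 32-chunk pieces, while tiny edits stay at one chunk per task.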
I suppose it's hard to make much of a judgement without some kind of usage statistics from servers with a lot of players on. I don't know if we would want to effectively enforce single-threaded operation per edit. For now it's probably fine, but it's worth seeing if we can keep an eye on this in bStats somehow?
It's generally difficult for me to tell how people are using FAWE (which patterns, masks, brushes, typical edit sizes, etc.), so I'd like to explore some kind of observability that can be shared (probably similar to
Yeah, I think the groundwork here is probably worth a merge for now. As it is, I don't know that large edits would be done very often, and I also wonder if perhaps for small edits (<ncpu chunks) we should just run single-threaded anyway.
Overview
Description
`ParallelQueueExtent` deals with distributing work onto multiple threads to speed up edits. In its current implementation, this is achieved by just using the `Region#getChunks().iterator()` as a queue and letting all threads pull the next chunk to work on from there. This typically works well, but it has one downside: different threads work on neighboring chunks. There are multiple reasons this isn't optimal for the per-thread `STQE`s: as one `STQE` will currently trim the chunk cached/processed by another `STQE`, we do a lot of extra work in such cases.

By distributing work in a more location-aware way, these downsides can be prevented, at the cost of somewhat more complicated work-splitting logic. This means threads will process more tasks, but those tasks are smaller and faster to process. As an additional benefit, this allows better work distribution when multiple edits are done concurrently, as the threads don't have to finish one edit before they are free again for the next edit. Consequently, thread-local data needs to be cleared and re-set more often.
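One way to picture the location-aware splitting (a simplified sketch using Java's fork/join framework; the PR's actual ApplyTask logic differs in its details): each task owns a rectangle of chunks and recursively forks halves along the longer axis, so each leaf task processes a compact block of neighboring chunks on a single thread while idle pool threads can steal the other half.

```java
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative fork/join variant (names hypothetical, not the PR's code).
public class LocalityAwareApply extends RecursiveAction {
    static final int THRESHOLD = 16; // max chunks per leaf task, illustrative
    static final AtomicInteger processedChunks = new AtomicInteger();

    final int minX, minZ, maxX, maxZ; // inclusive chunk bounds

    LocalityAwareApply(int minX, int minZ, int maxX, int maxZ) {
        this.minX = minX;
        this.minZ = minZ;
        this.maxX = maxX;
        this.maxZ = maxZ;
    }

    @Override
    protected void compute() {
        int count = (maxX - minX + 1) * (maxZ - minZ + 1);
        if (count <= THRESHOLD) {
            // Leaf: process a compact block of neighboring chunks locally.
            for (int x = minX; x <= maxX; x++) {
                for (int z = minZ; z <= maxZ; z++) {
                    processChunk(x, z);
                }
            }
            return;
        }
        // Split along the longer axis to keep pieces roughly square;
        // invokeAll lets an idle pool thread steal one of the halves.
        if (maxX - minX >= maxZ - minZ) {
            int midX = (minX + maxX) / 2;
            invokeAll(new LocalityAwareApply(minX, minZ, midX, maxZ),
                      new LocalityAwareApply(midX + 1, minZ, maxX, maxZ));
        } else {
            int midZ = (minZ + maxZ) / 2;
            invokeAll(new LocalityAwareApply(minX, minZ, maxX, midZ),
                      new LocalityAwareApply(minX, midZ + 1, maxX, maxZ));
        }
    }

    // Placeholder for the real chunk-editing work.
    void processChunk(int x, int z) {
        processedChunks.incrementAndGet();
    }
}
```

Under load, no leaf is bigger than the threshold, which bounds how long a single straggler task can hold up an edit.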
Exception handling is a bit of a question mark with this approach. I tried to keep it close to the original handling, but I'm not sure the way it is right now makes sense.
Submitter Checklist